To appear in AAAI-96

Lazy Decision Trees
Authors

Jerome H. Friedman, Ron Kohavi, Yeogirl Yun
Abstract
Lazy learning algorithms, exemplified by nearest-neighbor algorithms, do not induce a concise hypothesis from a given training set; the inductive process is delayed until a test instance is given. Algorithms for constructing decision trees, such as C4.5, ID3, and CART, create a single "best" decision tree during the training phase, and this tree is then used to classify test instances. The tests at the nodes of the constructed tree are good on average, but there may be better tests for classifying a specific instance. We propose a lazy decision tree algorithm, LazyDT, that conceptually constructs the "best" decision tree for each test instance. In practice, only a path needs to be constructed, and a caching scheme makes the algorithm fast. The algorithm is robust with respect to missing values without resorting to the complicated methods usually seen in induction of decision trees. Experiments on real and artificial problems are presented.

A longer version of this paper is available at http://robotics.stanford.edu/~ronnyk

Introduction

Delay is preferable to error. (Thomas Jefferson, 1743-1826)

The task of a supervised learning algorithm is to build a classifier that can be used to classify unlabelled instances accurately. Eager (non-lazy) algorithms construct classifiers that contain an explicit hypothesis mapping unlabelled instances to their predicted labels. A decision tree classifier, for example, uses a stored decision tree to classify instances by tracing the instance through the tests at the interior nodes until a leaf containing the label is reached. In eager algorithms, the inductive process is attributed to the phase that builds the classifier. Lazy algorithms (Aha, to appear), however, do not construct an explicit hypothesis, and the inductive process can be attributed to the classifier, which is given access to the training set, possibly preprocessed (e.g., data may be normalized). No explicit mapping is generated, and the classifier must use the training set to map each given instance to its label.

Building a single classifier that is good for all predictions may not take advantage of special characteristics of the given test instance that may give rise to an extremely short explanation tailored to the specific instance at hand (see Example 1).

In this paper, we introduce a new lazy algorithm, LazyDT, that conceptually constructs the "best" decision tree for each test instance. In practice, only a path needs to be constructed, and a caching scheme makes the algorithm fast. Practical algorithms need to deal with missing values, and LazyDT naturally handles them without resorting to the complicated methods usually seen in induction of decision trees (e.g., sending portions of instances down different branches or using surrogate features).

Decision Trees and Their Limitations

Top-down algorithms for inducing decision trees usually follow the divide-and-conquer strategy (Quinlan 1993; Breiman et al. 1984). The heart of these algorithms is the test selection, i.e., which test to conduct at a given node. Numerous selection measures exist in the literature, with entropy measures and the Gini index being the most common. We now detail the entropy-based selection measure commonly used in ID3 and its descendants (e.g., C4.5) because the LazyDT algorithm uses a related measure. We will then discuss some of the limitations of eager decision tree algorithms and motivate our lazy approach.
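This excerpt describes LazyDT only conceptually, so a short sketch may help fix the idea of growing just one decision path per test instance. The following Python is a minimal illustration under stated assumptions: discrete features, `None` marking a missing value, and plain information gain as the split criterion. The paper's actual selection measure (a related but different entropy measure), its caching scheme, and the names `lazy_classify` and `split_gain` are not from the paper.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(Y) = -sum_y p(y) log2 p(y) over the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_gain(rows, labels, feature):
    """Entropy reduction from splitting the current training subset on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def lazy_classify(instance, rows, labels):
    """Grow only the decision path the given test instance would follow.

    Instead of building a full tree up front, repeatedly pick a test that
    looks good for the training subset consistent with the instance so far,
    then keep only the training examples that agree with the instance on
    that test. Tests on missing values are simply skipped.
    """
    remaining = set(range(len(instance)))
    while remaining and len(set(labels)) > 1:
        best = max(remaining, key=lambda f: split_gain(rows, labels, f))
        remaining.discard(best)
        value = instance[best]
        if value is None:          # missing value in the test instance:
            continue               # never branch on it
        subset = [i for i, row in enumerate(rows) if row[best] == value]
        if not subset:             # no training example takes this branch
            break
        rows = [rows[i] for i in subset]
        labels = [labels[i] for i in subset]
    return Counter(labels).most_common(1)[0][0]  # majority label at the "leaf"

# Toy usage: two binary features, the second one irrelevant.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["no", "no", "yes", "yes"]
print(lazy_classify((1, None), rows, labels))  # -> "yes" despite the missing value
```

Note how the missing second feature never causes trouble: the path simply never tests it, which is the intuition behind the paper's claim of natural missing-value handling.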
Test Selection in Decision Trees

To describe the entropy-based selection measure, we follow the notation of Cover & Thomas (1991). Let $Y$ be a discrete random variable with range $\mathcal{Y}$; the entropy of $Y$, sometimes called the information of $Y$, is defined as

$$H(Y) = -\sum_{y \in \mathcal{Y}} p(y) \log_2 p(y),$$

where $p(y)$ denotes the probability that $Y = y$.
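As a concrete check of the definition (on a hypothetical distribution, not one from the paper), the entropy of a binary label variable with empirical probabilities 5/8 and 3/8 works out to about 0.954 bits:

```python
from math import log2

# Hypothetical empirical label distribution: 5 positives, 3 negatives.
p = [5 / 8, 3 / 8]

# H(Y) = -sum over y of p(y) * log2 p(y); it peaks at 1 bit for p = [1/2, 1/2].
h = -sum(q * log2(q) for q in p)
print(f"H(Y) = {h:.4f} bits")  # H(Y) = 0.9544 bits
```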